HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search
Abstract
Fast nearest neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia content, which is often high-dimensional. Rather than exact NN search, extensive research efforts have focused on approximate NN search algorithms. In this work, we present “HDIdx”, an efficient, open-source high-dimensional indexing library for fast approximate NN search, written in Python. It offers a family of state-of-the-art algorithms that convert high-dimensional input vectors into compact binary codes, which makes NN search efficient and scalable with very low space complexity.
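The idea of converting vectors into compact binary codes and searching over them can be illustrated with a minimal sketch. The abstract does not say which algorithms HDIdx implements, so the code below uses random-hyperplane (sign of random projection) hashing purely as a stand-in. The function names `train_projection`, `encode`, and `hamming_search` are hypothetical and are not HDIdx's API.

```python
import numpy as np


def train_projection(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in a dim-dimensional space (illustrative only)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))


def encode(vectors, projection):
    """Map each row of `vectors` to a packed binary code, one bit per hyperplane."""
    bits = (vectors @ projection) > 0      # shape (n, n_bits), boolean
    return np.packbits(bits, axis=1)       # shape (n, ceil(n_bits / 8)), dtype uint8


def hamming_search(query_code, db_codes, k):
    """Return indices and Hamming distances of the k codes closest to query_code."""
    xor = np.bitwise_xor(db_codes, query_code)       # differing bits, still packed
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # count differing bits per row
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


# Usage: 1000 random 128-dimensional vectors compressed to 64 bits (8 bytes) each.
rng = np.random.default_rng(1)
database = rng.standard_normal((1000, 128))
projection = train_projection(dim=128, n_bits=64)
db_codes = encode(database, projection)

query = database[42] + 0.01 * rng.standard_normal(128)  # slightly perturbed copy
query_code = encode(query[None, :], projection)[0]
ids, dists = hamming_search(query_code, db_codes, k=5)
```

In this sketch each 128-dimensional float vector is stored in 8 bytes, and a query is answered by XOR and bit counting rather than floating-point distance computation. That is the general trade-off the abstract describes: lower space cost and faster search in exchange for approximate results.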
Related resources
Metric-Based Shape Retrieval in Large Databases
This paper examines the problem of database organization and retrieval based on computing metric pairwise distances. A low-dimensional Euclidean approximation of a high-dimensional metric space is not efficient, while search in a high-dimensional Euclidean space suffers from the “curse of dimensionality”. Thus, techniques designed for searching metric spaces must be used. We evaluate several su...
Improving Bilayer Product Quantization for Billion-Scale Approximate Nearest Neighbors in High Dimensions
The top-performing systems for billion-scale high-dimensional approximate nearest neighbor (ANN) search are all based on two-layer architectures that include an indexing structure and a compressed datapoints layer. An indexing structure is crucial as it allows to avoid exhaustive search, while the lossy data compression is needed to fit the dataset into RAM. Several of the most successful syste...
A New Approach to Indexing in High-Dimensional Space
SVI is a promising new scheme for indexing high-dimensional points and vectors for use in vector retrieval and for finding the k-nearest neighbours. SVI performs an approximate search; that is, it trades off the completeness of the search for speed. The indexing scheme is built around a rule that was found by applying data mining techniques to sets of random vectors. This approach could well lead ...
Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space
Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor se...
An efficient nearest neighbor search in high-dimensional data spaces
Similarity search in multimedia databases requires an efficient support of nearest neighbor search on a large set of high-dimensional points. A technique applied for similarity search in multimedia databases is to transform important properties of the multimedia objects into points of a high-dimensional feature space. The feature space is usually indexed using a multidimensional index structure...
Journal title:
- Neurocomputing
Volume 237
Publication date: 2017